164 research outputs found

    Improving Sentiment Analysis with Document-Level Semantic Relationships from Rhetoric Discourse Structures

    Conventional sentiment analysis usually neglects semantic information between (sub-)clauses, as it merely implements so-called bag-of-words approaches, where the sentiment of individual words is aggregated independently of the document structure. Instead, we advance sentiment analysis by the use of rhetoric structure theory (RST), which provides a hierarchical representation of texts at document level. For this purpose, texts are split into elementary discourse units (EDUs). These EDUs span a hierarchical structure in the form of a binary tree, where the branches are labeled according to their semantic discourse. Accordingly, this paper proposes a novel combination of weighting and grid search to aggregate sentiment scores from the RST tree, as well as feature engineering for machine learning. We apply our algorithms to the especially hard task of predicting stock returns subsequent to financial disclosures. As a result, machine learning improves the balanced accuracy by 8.6 percent compared to the baseline.
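    The weighting-plus-grid-search idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Node` class, the nucleus/satellite weighting scheme, the sign-based decision rule, and the plain accuracy objective are all assumptions made for illustration.

```python
from dataclasses import dataclass
from itertools import product
from typing import Optional


@dataclass
class Node:
    """Hypothetical binary RST node: a leaf (EDU) carries a sentiment score,
    an inner node carries a nucleus child and a satellite child."""
    score: float = 0.0
    nucleus: Optional["Node"] = None
    satellite: Optional["Node"] = None


def aggregate(node: Node, w_nucleus: float, w_satellite: float) -> float:
    """Recursively combine leaf scores with one weight per child role (assumed scheme)."""
    if node.nucleus is None or node.satellite is None:
        return node.score  # leaf: return the EDU's own sentiment score
    return (w_nucleus * aggregate(node.nucleus, w_nucleus, w_satellite)
            + w_satellite * aggregate(node.satellite, w_nucleus, w_satellite))


def grid_search(trees, labels, grid):
    """Try every weight pair from `grid`; keep the first pair with the best accuracy.
    A document is predicted positive when its aggregated score is above zero."""
    best = None
    for w_n, w_s in product(grid, repeat=2):
        correct = sum((aggregate(t, w_n, w_s) > 0) == y for t, y in zip(trees, labels))
        accuracy = correct / len(trees)
        if best is None or accuracy > best[0]:
            best = (accuracy, w_n, w_s)
    return best  # (accuracy, w_nucleus, w_satellite)
```

    In this sketch, weighting the nucleus more heavily than the satellite lets a strongly worded side remark count for less than the main statement, which is the kind of structural information a bag-of-words aggregate cannot express.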

    Generating Dialogues for Virtual Agents Using Nested Textual Coherence Relations

    This paper describes recent advances in the Text2Dialogue system we are currently developing. Our system enables automatic transformation of monological text into a dialogue. The dialogue is then 'acted out' by virtual agents, using synthetic speech and gestures. In this paper, we focus on the monologue-to-dialogue transformation, and describe how it uses textual coherence relations to map text segments to query–answer pairs between an expert and a layman agent. By creating mapping rules for a few well-selected relations, we can produce coherent dialogues with proper assignment of turns for the speakers in a majority of cases.

    HILDA: A Discourse Parser Using Support Vector Machine Classification

    Discourse structures have a central role in several computational tasks, such as question-answering or dialogue generation. In particular, the framework of Rhetorical Structure Theory (RST) offers a sound formalism for hierarchical text organization. In this article, we present HILDA, an implemented discourse parser based on RST and Support Vector Machine (SVM) classification. SVM classifiers are trained and applied to discourse segmentation and relation labeling. By combining labeling with a greedy bottom-up tree building approach, we are able to create accurate discourse trees in linear time complexity. Importantly, our parser can parse entire texts, whereas the publicly available parser SPADE (Soricut and Marcu 2003) is limited to sentence-level analysis. HILDA outperforms other discourse parsers for tree structure construction and discourse relation labeling. For the discourse parsing task, our system reaches 78.3% of the performance level of human annotators. Compared to a state-of-the-art rule-based discourse parser, our system achieves a performance increase of 11.6%.
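    The greedy bottom-up tree building mentioned in this abstract can be sketched roughly as follows. This is an illustration only, not HILDA's code: the `score` callable is a hypothetical stand-in for the trained SVM classifiers, subtrees are represented as plain nested tuples, and relation labels are omitted. The sketch also re-scores every adjacent pair in each round, so it runs in quadratic time and does not reproduce the linear-time behavior the abstract reports.

```python
from typing import Callable, Sequence


def build_tree(edus: Sequence, score: Callable[[object, object], float]):
    """Greedily merge the best-scoring pair of adjacent spans until one tree remains.

    `edus` are the elementary discourse units in text order; `score(left, right)`
    is an assumed function returning how strongly two adjacent spans should be joined.
    """
    if not edus:
        raise ValueError("at least one EDU is required")
    spans = list(edus)
    while len(spans) > 1:
        # Pick the adjacent pair with the highest score (the first one on ties).
        best = max(range(len(spans) - 1),
                   key=lambda i: score(spans[i], spans[i + 1]))
        # Replace the two spans with a single subtree covering both.
        spans[best:best + 2] = [(spans[best], spans[best + 1])]
    return spans[0]
```

    With a toy scorer that prefers joining "b" and "c", the three units "a", "b", "c" are combined as `("a", ("b", "c"))`: the preferred pair is merged first and the remaining unit is attached afterwards.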